Method of error recovery in a data communication system

ABSTRACT

Described is a method of error recovery in a data communication system of the kind comprising two nodes connected by a serial link and wherein data is transmitted between the nodes in the form of packets of a predefined format. Each node receives data over an inbound line and transmits data over an outbound line. When an error is detected, both nodes enter a link check state, invoke a Link Error Recovery Procedure (ERP) and exchange status by means of Link Resets. Error recovery is performed separately for each line. Each node is responsible for recovering packets that were lost on its outbound line. In normal operation of the link, the transmitter does not reuse a packet buffer until it has received a response from the connected node indicating that the packet was correctly received. Therefore when an error occurs, the affected packets are still available for retransmission.

TECHNICAL FIELD OF THE INVENTION

This invention relates to the field of data communication and more particularly to a method for recovering from errors occurring during transmission of data between nodes in a digital system.

BACKGROUND OF THE INVENTION

Data communication systems, and the elements making up data communication systems, involve the electronic transmission of data over a link from one node (e.g. a computer or terminal) of the system to another. To ensure communication in an orderly fashion over the data link, a uniform method of sending and receiving information is required. This uniformity is achieved by means of a protocol (a set of rules) used for the management of a data link in the communication system. Protocols are used to perform such functions as establishing the conversation between two nodes in the communication system, identifying the sender and receiver, acknowledging received information and initialising nodes. The exact procedure and function performed depend on the protocol used. Data link protocols may be classified in two categories: bit oriented protocols (BOP) and byte oriented protocols.

Prior bit oriented protocols include the Synchronous Data Link Control (SDLC) protocol, which was introduced by IBM in 1973, and the High Level Data Link Control (HDLC) protocol. All communications in a BOP system are in the form of frames of uniform format which comprise a number of fields, each having a definite location and precise meaning. In HDLC a frame commonly starts with an eight bit flag sequence, which is followed by ADDRESS and CONTROL fields, after which an INFORMATION field may follow (depending on the function of the frame). The INFORMATION field is followed by a FRAME CHECK SEQUENCE (FCS) field, and the end of the frame is delimited by another flag sequence. The ADDRESS and CONTROL fields in HDLC each comprise a single octet. The INFORMATION field may contain a variable number of bits, in the form of an integral number of octets, up to a predefined limit. The FCS field commonly comprises a pair of octets.

In HDLC, the CONTROL field defines the function of the frame. There are three basic types, Information, Supervisory and Unnumbered, which are referred to as I-frames, S-frames and U-frames. The I-frame is used to provide for information transfer across the link and contains an INFORMATION field. The S-frame is used to perform supervisory functions on the link and may be used to acknowledge I-frames or to request retransmission of frames. The U-frame is used particularly in error recovery.

HDLC is often employed in systems wherein data communication is over relatively long distances, where there will be a number of data frames on the link at any one time. The method of acknowledging that data has been received has to be capable of detecting the incorrect transmission of any one of these data frames. An implied acknowledgement technique is used which enables frame acknowledgement information to be included within an I-frame. This is accomplished by assigning identification numbers, called sequence numbers, to received and transmitted frames. These numbers contain information pertaining to the number of frames transmitted and received by the individual node. By checking these numbers, the node can compare the number of received frames with the number of transmitted frames and take the appropriate error recovery action if a discrepancy exists. Although the packet sequence numbers used in the described implied acknowledgement technique may be included within an I-frame, if information frames are not being sent by the node receiving the data frame to be acknowledged, then it is necessary to include the sequence number information in a separate S-frame. Details of one subset of HDLC can be found in `X25 explained` by R J Deasington, published by Ellis Horwood Limited.

Errors occurring during data transmission between nodes are corrected in different ways depending on the type of error detected. If the receiver receives a frame which is out of sequence, an S-frame with a reject (REJ) control field is sent by the receiver to the transmitter. This requests retransmission of I-frames starting at the one after the last frame that was correctly received. If an error is detected which cannot be recovered by the retransmission of identical frames, then the node detecting the error sends a U-frame which includes a copy of the control field of the frame which has been rejected and an indication of the type of error encountered.

In bit oriented protocols, it is not necessary to operate in a send and hold mode, whereby the transmitting node has to wait for acknowledgement that a frame has been received before transmitting a subsequent frame. Thus bit oriented protocols are operable in full duplex mode (two way simultaneous communication). BOP systems may of course be operated in half duplex mode (two way alternate communication), though in half duplex mode the advantages inherent in the protocol are not used.

One type of byte oriented protocol is BISYNC, in which information is transmitted in blocks consisting of one or two sync characters, an address, control characters, an information field and an error checking code. Special block control characters are used to manage the flow of information over the link. In BISYNC it is necessary to ensure that the information field does not contain a bit sequence which corresponds to one of the control characters, otherwise that bit sequence will be incorrectly interpreted by the system as a control character. BISYNC is an example of a send and hold protocol, in which the transmitting node has to receive acknowledgement of a first block of data before it can begin sending a second block. Accordingly, BISYNC in its basic form is not able to operate in full duplex mode.

DISCLOSURE OF THE INVENTION

The invention provides, in one aspect, an error recovery method for use in a data communication system of the kind comprising two nodes connected by a serial link, data being transferred between the nodes in packets of a predefined format. In this method each node monitors the system for errors and, if an error is detected by one node, that node sends a message to the other node, the message including a sequence number indicative of the last packet received by said one node. On receipt of said message the other node sends a message to said one node including a sequence number indicative of the last packet received by that node. Each node determines from the message from the other node the number of packets, if any, that were not correctly received by the other node, and retransmits the missing packets.

In a preferred method, when the first node detects an error it enters a link check state and invokes a link error recovery procedure (ERP), which causes the first node to build a link status byte indicating the type of error detected and to send the link status byte to the second node. On receipt of the link status byte, the second node identifies whether the indicated error is of the type which may be recovered by retransmission of one or more packets of data and, if so, transmits said packets to the first node.

Thus in the error recovery method of the present invention, the error recovery actions are symmetric in that each node is made aware of the other node's status during the Link Error Recovery Procedure. Exchange of this status between the nodes identifies which packets have to be retransmitted.
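A minimal Python sketch of the count each node derives from the exchanged status, assuming that the sequence number a node reports is its 2-bit Receive Sequence Number (the PSN it expects next, described later) and that the other node compares it with its own Transmit Sequence Number (the PSN it would give its next new packet). The function names are hypothetical, and Link Reset packets are assumed not to consume sequence numbers.

```python
from __future__ import annotations

SEQ_MODULUS = 4  # 2-bit packet sequence numbers wrap modulo 4


def missing_packet_count(local_tsn: int, remote_rsn: int) -> int:
    """Packets this node has sent that the remote node has not accepted.

    local_tsn:  PSN this node would give its next new packet (assumed meaning).
    remote_rsn: PSN the remote node reports that it expects next (assumed meaning).
    """
    return (local_tsn - remote_rsn) % SEQ_MODULUS


def psns_to_retransmit(local_tsn: int, remote_rsn: int) -> list[int]:
    """PSNs of the packets to resend, oldest first."""
    count = missing_packet_count(local_tsn, remote_rsn)
    return [(remote_rsn + i) % SEQ_MODULUS for i in range(count)]


# Example: this node has sent PSNs 2 and 3 (so its TSN is now 0), but the
# remote node reports that it still expects PSN 2.
print(psns_to_retransmit(local_tsn=0, remote_rsn=2))  # [2, 3]
```

Because the difference is taken modulo 4, the count is only unambiguous while fewer than 4 packets can be outstanding; the buffer rules later in this description limit this to one or two.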

It is necessary at this stage to clarify the meaning of the various terms used in the present invention and their relation to the terminology employed in the description of the prior art. The term `packet` as used in the present invention is essentially equivalent to the term `frame` as used in the prior art description of the HDLC protocol. In addition, the term `frame` used in the present invention is essentially equivalent to the term `octet` as used in the description of HDLC.

The recovery of many types of error is transparent to the application. In the case of a link error, i.e. when a packet sent by a node is not received correctly by the receiving node and is therefore not acknowledged by the receiving node, the data remains in a transmit buffer in the transmitting node and is available for retransmission. However, if the detected error is of a type (e.g. a hardware error) which is probably not recoverable by packet retransmission, then the application which is communicating via the link is alerted. The application then takes the necessary action to recover from the error.

According to another aspect of the invention, there is provided a data communication system including two nodes connected by a serial link over which data is transferred between the nodes in packets of a predefined format, error detection means in each node for detecting errors in the system, and transmission error recovery means in at least one node responsive either to detection of an error by the error detection means of that node to cause that node to send an error message to the other node including a sequence number indicative of the last packet received by said at least one node, or to receipt of an error message from the other node to cause said at least one node to send its error message to the other node, each node being arranged to determine from the error message from the other node the number of packets, if any, that were not correctly received by the other node and to retransmit the missing packets.

The ERP used in the present invention can be implemented in software or, being systematic, in hardware if performance is critical.

A preferred embodiment of the invention will now be described by way of example only with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a schematic diagram of the main components of a node to node data link configuration according to the present invention;

FIGS. 2a and 2b are schematic diagrams showing the components of FIG. 1 in detail;

FIGS. 3a and 3b are state diagrams for the transmitter packet and control FSMs as employed in the present invention;

FIGS. 4a and 4b are state diagrams showing the state transitions of the receiver packet and control FSMs as used in one embodiment of the present invention;

FIG. 5 shows the format of a data packet by means of which data is transmitted;

FIG. 6 shows the format of the link status byte employed in the present invention;

FIG. 7 is a flow diagram showing the steps involved in recovering from a corrupted ACK response;

FIG. 8 shows the link hardware connected via a microprocessor to a data buffer.

DETAILED DESCRIPTION OF THE INVENTION

A glossary of terms used in the following description can be found in attached Table I.

FIG. 1 shows two nodes (node 1 and node 2), each of which has an associated inbound link 7 and outbound link 5. Each link controls the transmission or receipt of data to and from the connected node. Data to be transmitted or that has been received is held in outbound and inbound packet buffers 11 respectively. Each packet buffer has associated with it a packet status register 14 in which some of the information required for the transmission or receipt of data is held.

Data is transmitted between the nodes in the form of packets of a predefined format. The flow of the data packets is controlled by means of control frames. Details of the data packets and control frames will now be described.

There are two basic types of frame employed: DATA frames and PROTOCOL frames. In the embodiment described herein there are 256 data frames and 4 protocol frames. The protocol frames are used to delimit packets and to provide flow control.

FIG. 5 shows the packet format which is used for the transmission of data over the link. A packet consists of a sequence of at least 4 data frames that is delimited at both ends by a FLAG frame (described below). A packet is divided into a sequence of 3 or 4 fields as follows:

Control field. (1 frame, always present.)

Address field. (1 frame, always present.)

Data field. (Variable length, optionally present.)

CRC field. (2 frames, always present.)

The shortest possible packet, with no data field, contains 4 data frames. If a node receives a packet containing fewer than 4 frames, then it will indicate a protocol error.
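As an illustration of the field layout, the following hypothetical Python helper splits the decoded data frames found between a leading and a trailing FLAG into the four fields and applies the minimum-length rule. It assumes that FLAGs, NULs and interleaved responses have already been removed, and it does not check the CRC.

```python
from dataclasses import dataclass

MIN_PACKET_FRAMES = 4  # control + address + 2 CRC frames, with no data field


class ProtocolError(Exception):
    """Raised when a received packet breaks the framing rules."""


@dataclass
class Packet:
    control: int
    address: int
    data: bytes  # may be empty: the data field is optional
    crc: bytes   # always 2 frames


def split_packet(frames: bytes) -> Packet:
    """Split the decoded data frames between two FLAGs into packet fields."""
    if len(frames) < MIN_PACKET_FRAMES:
        raise ProtocolError(
            f"{len(frames)} data frames; at least {MIN_PACKET_FRAMES} required")
    return Packet(control=frames[0], address=frames[1],
                  data=frames[2:-2], crc=frames[-2:])
```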

CONTROL FIELD: The control field is the first data frame following a FLAG. When received by the receiving node and after decoding (more detail of which is given later), the resulting byte is interpreted as follows:

[Control field layout (figure not reproduced): User Defined, Link Reset, Total Reset and Packet Sequence No. bits]

User Defined: these are spare bits and may be used for any purpose that the using system requires.

Link Reset & Total Reset: These bits are used in the error recovery procedure which is associated with the communication method described herein. Some details of the error recovery procedure are described in the present application, but more specific detail of the associated error recovery procedure can be found in a concurrently filed application entitled `Method of Error Recovery` filed in the same name as the present application.

Packet Sequence No: These 2 bits are used to protect against lost or duplicate packets. They are incremented modulo 4 by the transmitter in each successive packet and checked by the receiver.

ADDRESS FIELD: This is a single data frame that immediately follows the control field. It normally contains the encoded destination address of the packet within the remote node.

DATA FIELD: The data field is optional. If present it consists of a variable number of data frames that follow the address field. The content of the data field is controlled entirely by the application and it is of no relevance to the architecture of the link. The maximum length of the data field is implementation dependent and it depends on (i) the size of the available packet buffers, (ii) the sustained data rate that is required and (iii) the acceptable error rate given the system environment and the defined CRC polynomial.

Some implementations may have further restrictions, e.g. the length of the data field must be an even number of frames. In such an implementation, if a node receives a packet with an incorrect length for the data field then it rejects the packet.

CRC FIELD: The CRC field consists of 2 data frames that immediately precede the trailing FLAG. It is used to check the control, address and data fields. The destination does not regard any of the fields as valid until the CRC field has been received and checked by the receiver.

For each packet of data, the CRC field is calculated by the outbound CRC generator, in a 16 bit register, using the following polynomial:

    $X^{16} + X^{15} + X^{2} + 1$

The CRC register is preset to all ones at the start of each packet.

The inbound CRC accumulator in the inbound serial link decodes the CRC field and checks it using the same polynomial as described above. The CRC register is preset to all ones at the start of each packet and accumulated over the control, address and data fields and then over the received CRC field itself. At the end of accumulation, provided that the incoming packet was received without error, the CRC register in the inbound accumulator should contain all zeros.
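A bitwise Python sketch of the CRC calculation. The polynomial and the all-ones preset come from the text; the bit order (most significant bit of each frame first), the byte order of the 2-frame CRC field (high byte first) and the absence of any final inversion are assumptions. Under those assumptions the receiver can feed the received CRC field through the same register after the control, address and data fields, and an error-free packet leaves all zeros.

```python
CRC_POLY = 0x8005    # X^16 + X^15 + X^2 + 1, with the X^16 term implicit
CRC_PRESET = 0xFFFF  # register preset to all ones at the start of each packet


def crc16(frames: bytes, reg: int = CRC_PRESET) -> int:
    """Shift each frame through the 16-bit register, MSB first (assumed order)."""
    for byte in frames:
        reg ^= byte << 8
        for _ in range(8):
            if reg & 0x8000:
                reg = ((reg << 1) ^ CRC_POLY) & 0xFFFF
            else:
                reg = (reg << 1) & 0xFFFF
    return reg


def append_crc(fields: bytes) -> bytes:
    """Transmitter: append the 2-frame CRC field, high byte first (assumed)."""
    return fields + crc16(fields).to_bytes(2, "big")


def crc_ok(packet: bytes) -> bool:
    """Receiver: accumulating over all fields including the CRC leaves zero."""
    return crc16(packet) == 0
```

Usage: `crc_ok(append_crc(fields))` holds for any `fields`, while flipping a single bit anywhere in the result makes it fail, since a polynomial with more than one term detects every single-bit error.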

One type of protocol frame defined in the communication method of the present invention is a FLAG frame, which is used to delimit a packet. The transition from a FLAG frame to a data frame marks the start of a packet, and the transition from a data frame to a FLAG frame marks the end of a packet. These will be referred to later in the description as the leading FLAG and the trailing FLAG respectively.

To minimise overheads, a trailing FLAG can also be the leading FLAG of the next packet. Thus consecutive packets are separated by a minimum of one FLAG. The bit pattern for the FLAG frame has been chosen such that it does not occur at any bit position in an arbitrary sequence of any other valid frames. An example bit pattern for this and other types of protocol frame is described below. The FLAG frame also serves the additional purpose of providing frame synchronisation. In addition, FLAG frames are sent when the link is idle in order to maintain synchronisation at the receiving end of the link.

The communication method and system of the present invention provide the means for transmitting data in the form of the above described packets from a source to a destination node. To implement the necessary flow control, the destination sends the source two responses for each packet:

ACKNOWLEDGEMENT A pair of consecutive ACK protocol frames

RECEIVER READY A pair of consecutive RR protocol frames.

The control frames are preferably used in pairs to protect the responses from being manufactured by transmission errors. A node only acts on a response when it has received both frames of the pair without any other intervening frames.
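The pairing rule can be sketched as a small Python scanner over a stream of frame names. Representing frames as strings such as `"ACK"`, `"RR"` and `"D"` (any other frame) is an assumption for illustration. A response frame whose partner is displaced by any other frame is reported as a protocol error, in line with the rule for a single ACK frame given below.

```python
from __future__ import annotations

RESPONSE_FRAMES = {"ACK", "RR"}


def decode_responses(frames: list[str]) -> list[tuple[str, int]]:
    """Return (response, index) for each complete pair, or
    ("PROTOCOL_ERROR", index) for a response frame whose partner never came."""
    events: list[tuple[str, int]] = []
    pending: tuple[str, int] | None = None  # first frame of a pair, awaiting its partner
    for i, frame in enumerate(frames):
        if pending is not None:
            name, start = pending
            pending = None
            if frame == name:
                events.append((name, start))  # both frames, nothing in between: act on it
                continue
            events.append(("PROTOCOL_ERROR", start))  # pair broken by another frame
        if frame in RESPONSE_FRAMES:
            pending = (frame, i)
    return events  # a lone frame at the very end is still awaiting its partner


print(decode_responses(["FLAG", "ACK", "ACK", "D", "RR", "RR"]))  # [('ACK', 1), ('RR', 4)]
```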

In full duplex operation a node may wish to send a response for a received packet whilst it is in the middle of transmitting another packet. In this case the transmitter gives priority to the response and interleaves it within the packet. This scheme minimises latency, and it allows the maximum link throughput to be achieved with only 2 packet buffers in each transmitter and each receiver.

Since responses consist of control frames, the receiver can easily separate them from the data frames that make up a packet. The CRC field for a packet does NOT include any interleaved response frames.
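A Python sketch of interleaving and separation, with data frames modelled as integers and response frames as the strings `"ACK"` and `"RR"` (a modelling assumption). By dropping the response frames, the receiver recovers exactly the frames over which the CRC was calculated.

```python
from __future__ import annotations

RESPONSE_FRAMES = {"ACK", "RR"}


def interleave_response(packet: list, response: str, after_index: int) -> list:
    """Transmitter: give the response priority by inserting its frame pair
    into the packet immediately after packet[after_index]."""
    return packet[: after_index + 1] + [response, response] + packet[after_index + 1:]


def separate_responses(stream: list) -> tuple[list, list]:
    """Receiver: split a stream into data frames (covered by the CRC) and
    response frames (not covered), preserving the order of each."""
    data = [f for f in stream if f not in RESPONSE_FRAMES]
    responses = [f for f in stream if f in RESPONSE_FRAMES]
    return data, responses
```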

ACKNOWLEDGEMENTS: The communication method of the present invention requires a node to acknowledge every valid received packet. A packet is valid if it does not contain any of the `receiver errors` listed in the link status byte. The destination transmits an ACK response when it receives a valid packet. When the source receives the ACK response, the portion of the outbound data buffer which contained the information making up the acknowledged packet may be cleared ready for the input of new data to be transmitted.

Each node has two associated conditions, `waiting for ACK` and `ACK pending`. How these conditions control acknowledgements is next described with reference to FIG. 4:

1. When a node enters the `ready` state (the possible states of the link are described in more detail below) it clears `waiting for ACK` and `ACK pending`.

2. A node sets `waiting for ACK` 2 frame periods after it finishes transmitting the trailing FLAG of any packet. A node resets `waiting for ACK` when it receives an ACK response. The corresponding outbound packet buffer may then be deallocated and filled by another packet.

If a node does not receive the first ACK of the response within a predefined time (e.g. 10 microseconds) after setting `waiting for ACK` for a packet, then it recognises an ACK time out.

If a node is still `waiting for ACK` when it finishes transmitting the CRC field of the next packet, then it does not transmit the trailing FLAG. Instead it sends NUL frames until either the ACK response is received or an ACK time out occurs. If an ACK time out does occur in this state, then the node must send an illegal frame followed by FLAGs. The illegal frame aborts the packet and ensures that it is rejected by the remote node.

This protocol guarantees that the transmitter can always associate each ACK response unambiguously with the corresponding packet, independently of propagation delays, the transmission speed and the packet length.

If a node receives an ACK control frame when it is not `waiting for ACK`, or it receives only a single ACK frame, then it recognises a protocol error.

3. A node sets `ACK pending` immediately when it receives the trailing FLAG of a valid packet while the node is in the `ready` state.

4. When `ACK pending` is set the node must transmit an ACK response as soon as possible. However, if an RR response is in progress then it must be completed first. `ACK pending` is reset when the ACK response has been transmitted.
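The transmitter's handling of the end of a packet while it is still `waiting for ACK` (described in item 2 above) can be sketched as follows. Measuring the ACK time out in frame periods rather than the 10 microseconds quoted above, and the frame names used, are assumptions of this Python sketch.

```python
from __future__ import annotations


def frames_after_crc(waiting_for_ack: bool, ack_delay: int | None, timeout: int) -> list[str]:
    """Frames a node sends after the CRC field of a packet.

    ack_delay: frame periods until the ACK response arrives (None = never).
    timeout:   frame periods left before the ACK time out (hypothetical unit).
    """
    if not waiting_for_ack:
        return ["FLAG"]  # previous packet already acknowledged: close normally
    sent = []
    for period in range(timeout):
        if ack_delay is not None and period >= ack_delay:
            return sent + ["FLAG"]  # ACK arrived: now send the trailing FLAG
        sent.append("NUL")          # hold the packet open with NUL frames
    return sent + ["ILLEGAL", "FLAG"]  # ACK time out: abort so the remote node rejects it
```

Holding back the trailing FLAG in this way is what keeps at most one completed packet awaiting an ACK, so each ACK can be matched to its packet.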

PACING

Pacing ensures that the transmitter does not overrun the available buffers in the receiver. The unit of pacing is a packet. The method of communication of the present invention requires a receiver to have only one buffer, although at least 2 buffers are generally required to achieve continuous (full duplex) operation of the link.

Each node has two conditions that control pacing, `waiting for RR` and `RR pending`:

1. When a node enters the `ready` state it sets `waiting for RR` and `RR pending`. Consequently it will send an RR response immediately and it will not send any packets until it receives an RR response.

2. A node may only start to send a packet when either of the following conditions is satisfied:

The node is in the `ready` state and it is not `waiting for RR`.

The node is in the `check` state and the packet control field will specify Link Reset or Total Reset.

`Waiting for RR` is set when a node transmits the control field of any packet and is reset when it receives an RR response.

3. When all of the following conditions are satisfied, a node transmits an RR response immediately after the current frame:

The node is in the `ready` state and `RR pending` is set.

At least one inbound buffer is available to receive a packet in addition to the packet currently being received, if any.

The node is not currently transmitting an ACK response and `ACK pending` is not set.

`RR pending` is set when a node receives the control field of any packet, including invalid packets. It is reset when the node transmits an RR response.
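The two pacing decisions reduce to simple predicates. In this Python sketch the state names mirror the link availability states described later; the function and parameter names are hypothetical.

```python
def may_start_packet(state: str, waiting_for_rr: bool, control_specifies_reset: bool) -> bool:
    """Condition 2: may the node begin transmitting a packet?"""
    if state == "ready":
        return not waiting_for_rr
    if state == "check":
        return control_specifies_reset  # only Link Reset or Total Reset packets
    return False


def should_send_rr(state: str, rr_pending: bool, spare_inbound_buffers: int,
                   sending_ack: bool, ack_pending: bool) -> bool:
    """Condition 3: send an RR response immediately after the current frame?

    spare_inbound_buffers counts buffers free in addition to the one holding
    any packet currently being received.
    """
    return (state == "ready" and rr_pending and spare_inbound_buffers >= 1
            and not sending_ack and not ack_pending)
```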

PACKET SEQUENCE NUMBERS are employed to protect against packets being lost or duplicated by a transmission error. For example, if a FLAG is corrupted then two packets may be merged into one. A packet could be duplicated if a transmission error corrupts an ACK response. To guard against this the Link ERP needs to know whether the corresponding packet has actually been received by the destination.

The control field of each packet contains a 2 bit Packet Sequence Number (PSN). In normal operation the PSN increments modulo 4 in each successive packet.

Each node maintains a 2 bit Transmit Sequence Number (TSN) and it copies this into the PSN of each packet sent. The TSN is reset to `00`B in the `disabled` state and it is incremented modulo 4 for each packet transmitted, regardless of any response received.

Each node also maintains a 2 bit Receive Sequence Number (RSN). This is reset to `00`B in the `disabled` state and it is incremented modulo 4 only when the receiver accepts a packet, i.e. when it returns an ACK response. However, the RSN must not be incremented by Link Resets. When a packet is received the hardware checks the PSN against the RSN as follows:

If PSN=RSN then the sequence is correct and the receiver has received the packet it was expecting. Provided that there is no other error, the packet is accepted and an ACK response is returned.

If PSN is not equal to RSN and the packet does not specify Link Reset or Total Reset, then one or more packets have been lost. The current packet is not acknowledged and the node recognises a sequence error.

NB. The receiver ignores the PSN in a Link Reset or Total Reset packet.
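A Python sketch of the receiver's sequence check, returning the action taken and the updated RSN. The action strings and the `other_error` flag (standing for any other receiver error) are assumptions for illustration, and the reset of the RSN on entering the `disabled` state is not modelled.

```python
from __future__ import annotations

SEQ_MODULUS = 4  # RSN and PSN are 2-bit values


def check_sequence(psn: int, rsn: int, is_reset: bool,
                   other_error: bool = False) -> tuple[str, int]:
    """Return (action, new_rsn) for a received packet."""
    if is_reset:
        return "process-reset", rsn   # PSN ignored; resets never advance the RSN
    if psn != rsn:
        return "sequence-error", rsn  # one or more packets lost; no ACK
    if other_error:
        return "reject", rsn          # correct sequence but another receiver error
    return "ack", (rsn + 1) % SEQ_MODULUS  # accepted: return ACK and advance RSN
```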

NUL FRAMES

The node transmitter is permitted to insert NUL protocol frames within a packet anywhere after the first data frame. The receiver ignores NUL frames by discarding them. NUL frames are not included in the calculation of the CRC field. This facility is useful in the following cases:

(i) If the transmitter has started to send a packet but the data needed to complete the packet is temporarily unavailable.

(ii) If the transmitter is still waiting for an ACK response when it isready to send the trailing FLAG of the next packet.

In order to guarantee frame synchronisation, NULs are not permitted when the link is idle. If the receiver detects a NUL frame and it has not decoded a data frame since it received the last FLAG, then it indicates a protocol error.
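The NUL rules can be sketched as a filter over the frames that follow a FLAG. Data frames are modelled as integers and NUL as the string `"NUL"`; interleaved response frames are assumed to have been separated out already. The names are hypothetical.

```python
from __future__ import annotations


class ProtocolError(Exception):
    """Raised for a NUL frame in a position the protocol forbids."""


def strip_nuls(frames: list) -> list:
    """Drop NULs from the frames following a FLAG, enforcing the placement rule."""
    seen_data = False
    kept = []
    for frame in frames:
        if frame == "NUL":
            if not seen_data:
                raise ProtocolError("NUL before the first data frame since the last FLAG")
            continue  # NULs are discarded and never enter the CRC calculation
        seen_data = True
        kept.append(frame)
    return kept
```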

Packets may be aborted if a node detects an internal hardware error while it is transmitting a packet. This is achieved by inserting an illegal frame anywhere before the trailing FLAG.

Each node must provide at least one buffer for received packets. The buffer must be large enough to accommodate the longest packet that is defined. The buffer is needed to allow the CRC field to be verified before the receiver transfers the data field to the application or acts on the control and address fields. Since the unit of pacing is a packet, the buffer is also necessary to prevent overruns.

The source node must retain each packet until it receives the corresponding ACK response. If there is no ACK then the Link ERP may have to retransmit the last one or two packets.

To achieve continuous communication at the full bandwidth of the link, it is generally necessary for each node to have a pair of transmit buffers and a pair of receive buffers. This provides `A/B` buffering: one buffer of each pair is filled/emptied by the link while the other is emptied/filled by the application.

BUFFER MANAGEMENT

The transmit buffers must be carefully managed to allow correct recovery after an error. It may be necessary to retransmit or discard the last one or two packets that were transmitted just before an error. The link hardware must maintain sufficient status to identify the buffers containing these packets and the order in which they were transmitted. If there are N transmit buffers and the transmitter always accesses them in a cyclic sequence, then the following two pointers provide sufficient information:

TRANSMIT POINTER This points to the buffer that is to be transmitted next. It is incremented modulo N each time a trailing FLAG is transmitted.

RETRY POINTER This points to the next buffer to be acknowledged. It is incremented modulo N each time an ACK response is received while `waiting for ACK` is set. Normally it will follow the transmit pointer closely, but when an error occurs it may lag by up to 2 buffers.
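A hypothetical Python model of the two pointers. It adds an explicit count of outstanding buffers, which is not in the text: with only N = 2 buffers and a lag of 2, the two pointers coincide exactly as they do when nothing is outstanding, so some extra state (or the `waiting for ACK` condition) is needed to tell the two cases apart.

```python
from __future__ import annotations


class TransmitRing:
    """Transmit and retry pointers over N transmit buffers used cyclically."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.transmit = 0     # buffer to be transmitted next
        self.retry = 0        # next buffer to be acknowledged
        self.outstanding = 0  # added here: distinguishes full from empty

    def trailing_flag_sent(self) -> None:
        self.transmit = (self.transmit + 1) % self.n
        self.outstanding += 1

    def ack_received(self) -> None:
        # Only counted while an ACK is expected (a stand-in for `waiting for ACK`).
        if self.outstanding:
            self.retry = (self.retry + 1) % self.n
            self.outstanding -= 1

    def unacknowledged(self) -> list[int]:
        """Buffers the Link ERP may have to retransmit or discard, oldest first."""
        return [(self.retry + i) % self.n for i in range(self.outstanding)]
```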

LINK AVAILABILITY

The link hardware in each node can be in one of four availability states:

DISABLED. This is the power on state before the link is made operational.

ENABLED. This is a transient state on the way to making the link operational.

READY. This is the state for normal transmission and reception of packets.

CHECK. This state is entered when an error is detected. The link is not operational until the Link ERP successfully returns the hardware to the `ready` state.

The current state may be inspected and changed by the node processor to determine the state of the link and to enable and disable it. The hardware state may also change automatically when certain events occur.

DISABLED STATE

In this state the transmitter outputs all zeros and the receiver only responds to a Total Reset. The `disabled` state is entered automatically after a Local Reset is performed or when a packet specifying Total Reset is received in any of the other states. It is also selected explicitly during the Link ERP.

To guarantee recognition by the remote node, the minimum duration of the `disabled` state is 5 frame periods.

ENABLED STATE

When a node is ready to begin communications, the node processor will first check that the line driver and receiver are not indicating a line fault. It can then explicitly change the hardware state to `enabled`. In this state the transmitter outputs FLAGs and the receiver listens for a FLAG. When a FLAG is detected, the link hardware automatically enters the `ready` state. The node processor may need to poll to detect this transition, or the hardware may provide an interrupt.

READY STATE

This is the state for normal communication.

In order to allow the remote node sufficient time to acquire byte synchronisation, when a node first becomes `ready` it must transmit at least 5 FLAGs before sending any other frames.

When a node first becomes `ready` the transmitter will send an RR response when at least 1 inbound packet buffer is available. Similarly, it will not send any packets until it has received an RR response.

CHECK STATE

This state is entered automatically when the hardware detects an error or receives a packet specifying Link Reset. The link is then inoperable until the Link ERP successfully returns the hardware to the `ready` state. The transition to the `check` state invokes the Link ERP.

When the hardware enters the `check` state, the transmitter stops sending data packets after completing the current packet, if any. The transmitter then sends FLAGs continuously, except in the following cases:

If the receiver instructs it to send a response.

If the node processor instructs it to send a Link Reset or a Total Reset.

The receiver discards any incoming packets, except if they specify Link Reset or Total Reset. The receiver also discards RR responses, but ACK responses are accepted and actioned.

In the `check` state the application suspends filling the transmit buffers and emptying the receive buffers. This is to avoid transferring a bad received packet to memory.
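The transitions described for the four availability states can be collected into a table. This Python sketch includes only the transitions named in the text; the event names are hypothetical, and any unlisted event (for example an error reported in the `enabled` state) is assumed to leave the state unchanged.

```python
from enum import Enum


class LinkState(Enum):
    DISABLED = "disabled"
    ENABLED = "enabled"
    READY = "ready"
    CHECK = "check"


# (current state, event) -> next state, for the transitions named in the text.
TRANSITIONS = {
    (LinkState.DISABLED, "enable"): LinkState.ENABLED,          # explicit, by node processor
    (LinkState.ENABLED, "flag_received"): LinkState.READY,      # automatic
    (LinkState.READY, "error_detected"): LinkState.CHECK,       # automatic; invokes Link ERP
    (LinkState.READY, "link_reset_received"): LinkState.CHECK,  # automatic; invokes Link ERP
    (LinkState.CHECK, "erp_success"): LinkState.READY,          # Link ERP completed
    (LinkState.CHECK, "disable"): LinkState.DISABLED,           # selected explicitly during Link ERP
}


def next_state(state: LinkState, event: str) -> LinkState:
    """Apply one event; Local Reset and a received Total Reset act from any state."""
    if event in ("local_reset", "total_reset_received"):
        return LinkState.DISABLED
    return TRANSITIONS.get((state, event), state)
```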

WRAP MODE

Independently of the above states, a node may be able to operate its link hardware in the `wrap` mode. This is useful to perform a power on self test (POST) of the local hardware. In the `wrap` mode the transmitter output is internally connected to the receiver input. This allows half duplex communication using the normal protocol, except that packets and their responses share the same line. The link hardware can be fully tested without needing a remote node.

During the `wrap` mode the outbound line is held at logic zero and the inbound line is ignored.

The `wrap` mode should be selected with care at any time after the POSTs since, in some configurations, if the node processor hangs it may then be impossible to reset it.

BEGINNING COMMUNICATION

The link hardware is in the `disabled` state at power on. When a node processor wishes to begin communications it must take the following steps:

1. Check that the line interface circuits are not indicating a line fault. This would indicate that the remote node is not operational or that the cable is disconnected.

2. Put the link hardware into the `enabled` state. This will cause the transmitter to start sending FLAGs.

3. When FLAGs are received from the remote node, the link hardware will automatically change to the `ready` state.

4. When an RR response is received from the remote node, the transmitter resets `waiting for RR`.

5. A packet can now be transmitted, provided that at least 5 FLAGs have been sent since entering the `ready` state.
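The start-up sequence above can be expressed as a check that reports the first step not yet satisfied. This is a Python sketch; the argument names and messages are illustrative assumptions.

```python
from __future__ import annotations

MIN_FLAGS_AFTER_READY = 5  # FLAGs to send before any other frame after becoming ready


def startup_blocker(line_fault: bool, state: str, rr_received: bool,
                    flags_sent_since_ready: int) -> str | None:
    """Return the first unmet start-up step, or None once a packet may be sent."""
    if line_fault:
        return "line fault: remote node not operational or cable disconnected"
    if state == "disabled":
        return "put the link hardware into the enabled state"
    if state == "enabled":
        return "wait for FLAGs from the remote node"
    if state != "ready":
        return f"unexpected state {state!r} during start-up"
    if not rr_received:
        return "wait for an RR response"
    if flags_sent_since_ready < MIN_FLAGS_AFTER_READY:
        return "send at least 5 FLAGs after entering ready"
    return None
```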

ENDING COMMUNICATION

Since the link has to be quiesced first, the method of ending communication must be determined by the application. The following example is only intended to illustrate the steps that are necessary:

1. The node that wants to cease communications waits until the remote node has responded to all of its outstanding requests. It then sends a message requesting to shut down the link.

2. The remote node waits until the local node has responded to all of its outstanding requests and then it returns a message acknowledging shut down.

3. Both nodes then disable their link hardware.

PHYSICAL MEDIUM MODULATION

Data is transmitted as a base band digital signal using the NRZI method. A `1` bit is signalled by inverting the state of the line. For a `0` bit the state of the line is unchanged.
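NRZI as described (a `1` inverts the line, a `0` leaves it unchanged) in a short Python sketch; the initial line level of 0 is an assumption.

```python
from __future__ import annotations


def nrzi_encode(bits: list[int], level: int = 0) -> list[int]:
    """Return the line level for each bit: a 1 inverts the line, a 0 holds it."""
    levels = []
    for bit in bits:
        if bit:
            level ^= 1
        levels.append(level)
    return levels


def nrzi_decode(levels: list[int], level: int = 0) -> list[int]:
    """Recover bits: a transition means 1, no transition means 0."""
    bits = []
    for current in levels:
        bits.append(1 if current != level else 0)
        level = current
    return bits


print(nrzi_encode([1, 0, 1, 1]))  # [1, 1, 0, 1]
```

Because only `1` bits produce transitions, a long run of zeros leaves the receiver with nothing to recover its clock from, which is why the encoding described below bounds such runs.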

CLOCKING

The Serial Link operates synchronously. The receiver must extract a suitable clock from the transitions in the transmitted data.

ENCODING

Synchronous clocking restricts the bit patterns that the transmitter can use, since it is undesirable to have long sequences of zeros. Hence an encoding algorithm is required to convert the arbitrary data that one may wish to send into patterns suitable for transmission.

The serial link described in the present application uses a 4B/5B code which guarantees that there will never be more than 3 consecutive zeros in the transmitted data stream.

The transmitter encodes every 4 input data bits into one of the 16 five-bit `data symbols` shown in the accompanying symbol figure. The 5 `control symbols` may also be used freely for link control functions. Some of the 11 `restricted symbols` may also be used if care is taken to avoid violating the clocking requirements.
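The clocking guarantee can be checked mechanically once the symbol table is known. The figure listing the symbols is not reproduced here, so the sketch assumes the data symbols are those of the widely used FDDI-style 4B/5B code; this is consistent with the `23`x example given under DATA FRAMES but is not confirmed by the text. Control and restricted symbols are left out because their bit patterns are not given.

```python
from itertools import product

# ASSUMPTION: FDDI-style 4B/5B data symbols, indexed by nibble value 0x0-0xF.
DATA_SYMBOLS = [
    "11110", "01001", "10100", "10101",  # 0-3
    "01010", "01011", "01110", "01111",  # 4-7
    "10010", "10011", "10110", "10111",  # 8-B
    "11010", "11011", "11100", "11101",  # C-F
]


def longest_zero_run(bits: str) -> int:
    """Length of the longest run of consecutive '0' bits."""
    return max(len(run) for run in bits.split("1"))


def worst_case_zero_run(symbols) -> int:
    # No data symbol is all zeros, so a run of zeros can cross at most one
    # symbol boundary; checking every ordered pair covers every stream.
    return max(longest_zero_run(a + b) for a, b in product(symbols, repeat=2))


print(worst_case_zero_run(DATA_SYMBOLS))  # 3
```

Under this assumed table the worst case is 3, reached for example when `10100` (2 trailing zeros) is followed by `01001` (1 leading zero).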

DATA FRAMES

The following conventions are used in this description:

The bits in an unencoded byte are numbered 0 to 7 from left to right.

The bits in an encoded frame are designated a, b, c, d, e, f, g, h, j, k. Bit `a` is transmitted on the line first.

A 10-bit data frame is constructed by encoding each hexadecimal digit of the data to be transmitted according to the 4B/5B code. Bits 0-3 are encoded first, followed by bits 4-7. Thus, `23`x would be encoded as:

Bit: abcdefghjk

`23`x: 1010010101
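A data-frame encoder follows directly from these conventions. The symbol table is an assumption (the FDDI-style 4B/5B data code), since the figure listing the symbols is not reproduced; only the encodings of `2` and `3` are confirmed by the example above.

```python
# ASSUMPTION: FDDI-style 4B/5B data symbols, indexed by nibble value 0x0-0xF.
DATA_SYMBOLS = [
    "11110", "01001", "10100", "10101",
    "01010", "01011", "01110", "01111",
    "10010", "10011", "10110", "10111",
    "11010", "11011", "11100", "11101",
]


def encode_data_frame(byte: int) -> str:
    """Return the 10-bit frame (bits a..k, `a` sent first) for one data byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a data frame carries exactly one byte")
    # Bits are numbered 0-7 from the left, so bits 0-3 are the high nibble.
    high = byte >> 4    # bits 0-3, encoded first  -> frame bits a-e
    low = byte & 0x0F   # bits 4-7, encoded second -> frame bits f-k
    return DATA_SYMBOLS[high] + DATA_SYMBOLS[low]


print(encode_data_frame(0x23))  # 1010010101
```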

PROTOCOL FRAMES

Protocol frames are constructed from a combination of 2 symbols, at least one of which is a control symbol. This guarantees that a protocol frame can always be distinguished from a data frame.

Protocol frames that contain 2 control symbols provide added protection against noise on the line. Since control symbols differ from data symbols by at least one bit, such a frame will differ from a data frame by at least 2 bits. The availability of 5 control symbols provides for up to 25 such protocol frames.

Some frames can be constructed from one control symbol and one restricted symbol that still meet the clocking requirement of no more than 3 consecutive zeros. One such frame is used for the FLAG. This particular frame has been chosen because it does not occur in any phase of all possible combinations of data and control symbols. It therefore permits the receiver to acquire and verify frame synchronisation. The FLAG frame also contains relatively few transitions, to minimise RFI when the link is idle.

Only the following 4 protocol frames are defined:

Bit: abcdefghjk

FLAG: 1000100100

ACK: 0110101101

RR: 1111111111

NUL: 1100111001

ILLEGAL FRAMES: A 10-bit frame results in 1024 possible bit patterns. Since 256 of these patterns are data frames and 4 are protocol frames, this leaves 764 patterns that are undefined. If a node receives any undefined frame while it is in the `ready` state then it indicates an `illegal frame` error. The illegal frame `0000000000`B is of special interest. If it occurs consistently then it indicates that the remote node is in the `disabled` state. The receiver therefore provides a `no frames` indication to allow the Link ERP to detect this condition.
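A receiver has to sort every 10-bit frame into one of these classes. The sketch below does so with a lookup on the four protocol frames and on the two 5-bit halves of a data frame. The data-symbol table is an assumption (the FDDI-style 4B/5B data code, consistent with the `23`x example); with it, none of the four protocol frames decodes as data, and the count of undefined patterns comes out at 764 as stated.

```python
# Protocol frames as listed above (bits a..k, `a` first).
PROTOCOL_FRAMES = {
    "1000100100": "FLAG",
    "0110101101": "ACK",
    "1111111111": "RR",
    "1100111001": "NUL",
}

# ASSUMPTION: FDDI-style 4B/5B data symbols, indexed by nibble value 0x0-0xF.
DATA_SYMBOLS = [
    "11110", "01001", "10100", "10101",
    "01010", "01011", "01110", "01111",
    "10010", "10011", "10110", "10111",
    "11010", "11011", "11100", "11101",
]
DECODE = {symbol: nibble for nibble, symbol in enumerate(DATA_SYMBOLS)}


def classify_frame(frame: str):
    """Return ('protocol', name), ('data', byte) or ('illegal', None)."""
    if len(frame) != 10 or set(frame) - {"0", "1"}:
        raise ValueError("expected a 10-bit frame written as 0s and 1s")
    if frame in PROTOCOL_FRAMES:
        return ("protocol", PROTOCOL_FRAMES[frame])
    high, low = DECODE.get(frame[:5]), DECODE.get(frame[5:])
    if high is not None and low is not None:
        return ("data", (high << 4) | low)
    return ("illegal", None)


# 1024 patterns - 256 data frames - 4 protocol frames = 764 undefined frames.
illegal = sum(classify_frame(format(n, "010b"))[0] == "illegal" for n in range(1024))
print(illegal)  # 764
```

The all-zeros pattern falls in the illegal class, which is why a steady stream of it is reported separately as `no frames`.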

The transmission and receipt of a packet of data will next be described with reference to FIGS. 2, 3 and 5.

FIG. 8 shows the components of one node of FIG. 1 connected via DMA and I/O buses to a microprocessor 10, which is in turn connected to a data buffer 12. The microprocessor contains the logic to address and control the data buffer. The microprocessor also includes a DMA FSM which controls the transfer of data from the data buffer into the packet buffers of the link hardware. Details of the DMA transfer are not relevant to the present application and are therefore not described. In other systems employing the present invention, other means may be provided for transferring data for transmission into the packet buffers. In the described system, all data entering and leaving the link passes through the data buffer. The packet buffers are filled by data which arrives on the links or, in the described implementation, by DMA which fetches data from the data buffer. The I/O bus connecting the I/O interface to the microprocessor is used by the microprocessor to access a series of external registers implemented in the link logic. In addition, the microprocessor can build message packets, which differ from data packets in that they include message information in the data field. The message packets are held in the outbound link message buffer, from where they are transmitted in a similar manner to normal data packets. Message packets are used for commands, status and for initiating data transfers.

FIGS. 2a and 2b are interconnected and show the main components of the inbound and outbound links with associated packet buffer RAM 20 and packet status RAM 30. The A/B packet buffers for the inbound and outbound links are contained in the packet buffer RAM, and the packet status registers associated with each of the A/B buffers are contained in the packet status RAM. The packet status registers keep a count of the number of data bytes stored in the packet buffer and contain address information. Each of the outbound and inbound packet buffers requires a corresponding packet status register (PSR). The packet status registers are 16 bits wide and each contains two fields:

(i) An 8-bit destination field: For outbound packets this contains a value which will be copied into the address field of the outgoing packet when the corresponding packet buffer contents are transmitted by the link. This value may be automatically loaded by hardware when the packet is being fetched into the packet buffer in preparation for transmission. For inbound packets, this field contains an address extracted from the address field of the incoming packet. This value is written into the PSR by the inbound link FSM, and its value is used to determine the packet's subsequent routing.

(ii) An 8-bit byte count field: For outbound packets this contains a value which indicates the number of bytes which have been placed in the corresponding packet buffer. When the link transmits the packet, this value has to be copied into a byte counter (part of the link hardware) which is decremented as each data byte is sent. The value in the PSR is preserved in case the packet has to be retransmitted due to an error during link transmission. For inbound packets, this field contains a value which indicates the number of data bytes which were received in the incoming packet (excluding the two CRC bytes).
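The two PSR fields can be illustrated with simple bit packing. The text does not say which byte holds which field, so placing the destination in the high byte is an assumption, as are the function names.

```python
def pack_psr(destination: int, byte_count: int) -> int:
    """Pack a 16-bit PSR value.

    ASSUMPTION: destination in bits 15-8, byte count in bits 7-0.
    """
    for name, value in (("destination", destination), ("byte_count", byte_count)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    return (destination << 8) | byte_count


def unpack_psr(psr: int) -> tuple[int, int]:
    """Return (destination, byte_count) from a 16-bit PSR value."""
    return (psr >> 8) & 0xFF, psr & 0xFF


psr = pack_psr(0x42, 3)
byte_counter = unpack_psr(psr)[1]  # the hardware counter works on a copy
while byte_counter:
    byte_counter -= 1              # one decrement per data byte sent
print(hex(psr), unpack_psr(psr))   # 0x4203 (66, 3)
```

Because the byte counter works on a copy of the count field, the PSR value itself is untouched by transmission and remains available if the packet has to be retransmitted.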

FIGS. 3a and 3b show the various states and the transitions between states of the outbound (Tx) FSMs. As described previously, there are two FSMs: a packet FSM (FIG. 3a), which controls the transmission of packets, and a control FSM (FIG. 3b), which controls the transmission of ACK and RR responses. The transmission of a packet of data under the control of the Tx FSM will be described with reference to FIGS. 2a and 2b, which show in block diagram form the Tx FSM and connected hardware.

The outbound packet status arbitration logic 52 continuously monitors the outbound packet status bits associated with each packet buffer to determine if there is any data in the buffer which is ready to be transmitted. If so, the Tx packet FSM is notified by a pulse on line 110. At the same time arbitration logic 52 pulses line 120, which causes the byte counter 58 to load the value stored in the 8-bit count field of the packet status register associated with the packet buffer containing the data to be sent. The byte counter therefore contains a value corresponding to the number of data bytes to be transmitted in that particular packet. During packet transmission, the counter is decremented as each byte is transferred from the data packet buffer. A pulse is put onto line 122 when the counter decrements to zero.

When the packet FSM receives the signal over line 110, it sets low the FLAG line between the Tx FSM and the encoder 82, which stops the encoder transmitting FLAG frames. The packet FSM then presents a request for control information on line 101 (FIGS. 2a and 2b). It will be remembered that the control field of a data packet contains 8 bits. For a normal data packet (ie not a Link Reset or Total Reset packet), the first six bits of the control field are set to 0. These 6 bits are obtained from a control field register 54, the signal on line 101 causing mux 56 to pass the six bits, which are sent out on line 104 connecting mux 56 with mux 66. The last two bits of the control byte, which contain the packet sequence number information, are obtained from TSN register 70 and added to the 6 bits from the control field register. The transmit sequence number held in the TSN register is incremented for each packet transmitted, specifically when the packet FSM enters the `send control` state.

When the packet FSM enters the `send address` state, the require address line 102 is pulsed, which causes mux 56 to pass the 8 bits of address information contained in the destination field of the associated packet status register. The address information is then passed onto line 104. Because mux 56 selects exclusively, only one of the control, address and data lines may be set at any one time, and accordingly only control, address or data bytes are present on line 104 at any one time.

The packet FSM then leaves the `send address` state and checks whether the byte counter is set to zero. If so, there is no data to be transmitted in the packet data field and the FSM passes on to the `send CRC1, 2` state. If line 122 does not indicate that the byte counter is set to zero, then there is data to be transmitted and the FSM enters the `send data` state. The require data line 103 is then pulsed, which causes mux 56 to pass a data byte from the packet buffer over lines 203 and 104. Each time a require data signal is sent the byte counter is decremented modulo 4.

A transmitted data packet also contains a CRC field which consists of two data frames. The CRC field is calculated in two 8-bit CRC registers in outbound CRC generator 68. Both registers are preset to all ones at the start of each packet, more specifically when the Tx packet FSM enters the `send control` state. The CRC is then accumulated over the control, address and data fields by the registers in the outbound CRC generator 68. When the last byte of data contained in the packet buffer is sent (when the byte counter 58 has decremented to 0), the FSM enters the `send CRC1, 2` state, in which the contents of the two CRC registers are encoded by the encoder into 10-bit frames (CRC1 and CRC2) and transmitted. When CRC1 and CRC2 have been sent, the FSM sets ACK TIMER 72 running. If a pair of ACK frames for a previous data packet has not been received by the inbound link, the FSM sets the NUL line between the FSM and the encoder high, thereby causing the encoder to send out NUL frames. When the ACK frames are received before an ACK time-out occurs, the transmission of NUL frames is stopped and the FSM sets the FLAG line high, which causes a FLAG frame, defining the end of the packet, to be sent. The whole packet sending process is repeated if there is more data waiting to be sent; otherwise the FSM causes FLAG frames to be sent continuously.
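The order of the fields in a transmitted packet, and the way the 2-bit TSN fills the low-order control bits, can be sketched as follows. The CRC polynomial is not specified in this description, so the sketch takes the CRC as a caller-supplied function (hypothetical); the closing FLAG and any NUL frames sent while awaiting an ACK are not modelled.

```python
from typing import Callable


def packet_fields(
    tsn: int,
    destination: int,
    data: bytes,
    crc: Callable[[bytes], tuple[int, int]],
) -> list[tuple[str, int]]:
    """Return (field, byte) pairs for one normal data packet in send order."""
    if not 0 <= destination <= 0xFF:
        raise ValueError("the destination field is 8 bits")
    # Bits are numbered 0-7 from the left: control bits 0-5 are the high six
    # bits (all 0 for a normal packet) and bits 6-7 carry the 2-bit TSN.
    control = tsn & 0b11
    fields = [("control", control), ("address", destination)]
    fields += [("data", b) for b in data]  # one byte-counter step per byte
    # The CRC is accumulated over the control, address and data fields.
    crc1, crc2 = crc(bytes([control, destination]) + data)
    fields += [("crc1", crc1), ("crc2", crc2)]
    return fields
```

For example, `packet_fields(5, 0x42, b"\x01\x02", crc=lambda covered: (0xAA, 0x55))` starts with `("control", 1)`, since 5 modulo 4 is 1; the dummy CRC function only stands in for the real generator.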

After the control, address, data and CRC fields have been encoded, the packet passes through the serialiser and is transmitted along the outbound twisted pair by driver 86.

At any time during the transmission of a data packet, it may be necessary to send out a pair of response frames, either RR or ACK.

Transmission of responses has priority over the transmission of the packet, so it is necessary to provide the means to interrupt data packet transmission. Sending of ACK or RR responses is controlled by the Tx control FSM, which is normally in the idle state while data packets are being transmitted. When the control FSM receives a signal from the Rx FSM over line 107 or 108, the Tx packet FSM is interrupted and the control FSM is started. Depending on whether the response required is an ACK or RR, the control FSM sets high the RR or ACK line between the Tx FSM and the encoder, which then sends out a pair of 10-bit ACK or RR frames. When this is done, the control FSM re-enters the idle state and the packet FSM resumes transmission of the partly processed data packet.

Next will be described the operation of the Rx FSMs in receiving packets of data and response frames (RR and ACK) sent by a transmitter at the other end of the serial link. FIGS. 4a and 4b show the states of the two Rx FSMs (packet and control) and FIGS. 2a and 2b show the FSMs and connected components. Referring to FIG. 4a, the Rx packet FSM normally sits in the idle state, receiving pulses along line 140 from deserialiser 94 indicating the receipt of FLAG frames. As noted before, FLAG frames are sent continuously when no data packets are being sent, in order to maintain synchronisation at the receiver. The Rx packet FSM only `wakes up` when it receives something other than a FLAG frame. When a data frame (ie a control, address or data frame) is detected by the inbound 4B/5B decoder 92, the data line between the decoder and the Rx packet FSM is pulsed, which causes it to enter the `received control` state. The FSM pulses line 152, which resets byte counter 64. Counter 64 is actually reset to -2, in order to compensate for the two CRC frames expected at the end of the packet. The pulse on line 152 also resets write pointer 62. The CRC accumulator is also preset when the FSM enters the `received cntrl frame` state. The `cntrl here` line out of the Rx FSM is pulsed. This tells the external logic that the data to be presented on the 8-bit `rx_data` bus is the control frame that has just been received. The control frame is gated by 53 and the information is held in register 55 for access by the external logic. After the control frame has been received, a second data frame is normally expected. However, the next frame received may instead be a FLAG detected by the deserialiser. This can occur if the control frame that was supposedly received was caused by a fault on the incoming line. If a FLAG is received, the FSM indicates a protocol error. If the next frame is a data frame, the data line between the decoder and the FSM is pulsed once again, which causes the FSM to enter the `received address` frame state.

The FSM then pulses the `addr here` line 146. This tells logic outside the link that the data on 8-bit bus 150 is an address frame. The address frame is gated by 57 and the byte making up the address frame is held temporarily in register 59. The external logic looks at the address and decides whether the address is valid. If it is not, a pkt reject error is indicated.

After the address byte has been received, the next frame will be either a data byte or a CRC byte, depending on whether there is a data field in the packet. The FSM enters the `received data/crc` state. At this stage data and CRC frames are indistinguishable from one another. The FSM pulses line 148, indicating the presence of a data frame. The byte counter and write pointer are incremented and the data frame is transferred into the packet buffer. As data frames are received, the packet FSM goes round the receive data frame loop; each time a frame is received the byte counter and write pointer are incremented. As each frame is received (including control and address frames), inbound CRC accumulator 95 accumulates the CRC over the incoming frames. When all data frames have been received, a FLAG is detected by the deserialiser. This causes the Rx FSM to re-enter the idle state until the next data packet begins to come in. When the `received data` state of the FSM is exited on receipt of the end FLAG, the FSM pulses line 154, indicating that the last byte has been received. In addition, if the packet has been received without protocol, CRC or other errors, the FSM pulses line 156. If no errors have been detected and the CRC checksum is correct, the Rx FSM pulses line 107, which freezes the Tx packet FSM if it is in the middle of transmitting a packet and causes the Tx control FSM to send out a pair of ACK frames as described above. The RSN in RSN register 93 is also incremented. When lines 154 and 156 are pulsed, the count stored in byte counter 64 is copied into the packet status register associated with the packet buffer into which the data has been written. The address held in register 59 is also copied into the destination field of the status register.
After the address and count fields of the register have been written, the i/b packet full bit is set by a pulse on line 158, which indicates to the external logic (ie the logic that uses the data that has been received) that a packet has been correctly received and is ready for access. If an illegal frame is detected by the decoder during receipt of a packet, the portion of the packet received up to that point is discarded.
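The receiver's byte counting can be sketched as a function over the data frames collected between two FLAGs. Because data and CRC frames only become distinguishable when the closing FLAG arrives, starting the counter at -2 leaves it equal to the number of data bytes, excluding the two CRC bytes, as required for the PSR byte count field. The function and exception names and the dictionary result are illustrative; CRC checking is omitted because the polynomial is not given.

```python
class ProtocolError(Exception):
    """Raised for frame sequences the receiver must reject."""


def receive_packet(frames: list[int]) -> dict:
    """Split the data frames received between two FLAGs into packet fields."""
    # Control, address, CRC1 and CRC2 are always present, so fewer than four
    # data frames between FLAGs is the `short packet` protocol error.
    if len(frames) < 4:
        raise ProtocolError("short packet: fewer than 4 data frames between FLAGs")
    control, address, *rest = frames
    # Data and CRC frames look alike until the closing FLAG, so the byte
    # counter starts at -2 to cancel out the two trailing CRC frames.
    byte_count = -2
    for _ in rest:
        byte_count += 1  # one increment per frame written to the buffer
    return {
        "control": control,
        "address": address,
        "data": bytes(rest[:byte_count]),
        "crc": tuple(rest[byte_count:]),
        "byte_count": byte_count,
    }
```

For a packet with two data bytes, `receive_packet([0x01, 0x42, 0x10, 0x20, 0xAA, 0x55])` reports a byte count of 2 and CRC bytes `(0xAA, 0x55)`.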

As described previously, during full duplex operation of the link a response frame (RR or ACK) may be interleaved within a data packet. Accordingly, when a packet is being received as described above, the decoder may detect a pair of ACK or RR frames. The Rx control FSM, which is normally idling, is then started. When an RR frame is detected, the control FSM enters the `rx RR1` state. If a second RR frame is detected (as should be the case), the FSM enters the `rx RR2` state. The FSM then indicates that a valid RR response has been received, and this causes line 160 to be pulsed, thereby indicating to the packet arbitration logic 52 that the remote node is ready to receive more data. The arbitration logic knows whether there is data in the buffer to be transmitted and, if so, it begins packet transmission as described above. If the Rx control FSM receives a pair of ACK frames, it causes line 162 to be pulsed. The arbitration logic indicates to the history log that a valid ACK response has been received. The history log knows which packets are outstanding and require acknowledgment. The Tx packet buffer containing the acknowledged packet can then be cleared, ready for more data.
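The pairing rule applied by the Rx control FSM can be sketched as follows. The model assumes, as described above, that the two frames of a response are sent back to back, and it treats a first RR or ACK that is not followed by a matching frame as the `isolated frame` protocol error. Frame names other than RR and ACK (such as `"DATA"` in the example) are placeholders, and the class name is hypothetical.

```python
class ProtocolError(Exception):
    """Raised for an isolated RR or ACK frame."""


class ResponseDetector:
    """Sketch of the Rx control FSM: RR and ACK frames must arrive in pairs."""

    RESPONSES = ("RR", "ACK")

    def __init__(self) -> None:
        self.first = None  # first frame of a pair (`rx RR1` / ACK1), if any

    def feed(self, frame: str):
        """Return "RR" or "ACK" when a valid pair completes, otherwise None."""
        if self.first is not None:
            first, self.first = self.first, None
            if frame == first:
                return first  # second matching frame: a valid response
            raise ProtocolError(f"isolated {first} frame")
        if frame in self.RESPONSES:
            self.first = frame  # wait for the second frame of the pair
        return None
```

Feeding `["DATA", "RR", "RR", "DATA", "ACK", "ACK"]` yields `None, None, "RR", None, None, "ACK"`; the caller would then signal the arbitration logic or release the acknowledged transmit buffer.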

ERROR RECOVERY PROCEDURE (ERP)

The architecture of the serial link defines the method to recover TRANSMISSION errors at the packet level. Recovery is performed by a self-contained Link ERP using the existing transmit packet buffers. This has the following benefits:

The application software is simplified since recovery is transparent.

There is no need to terminate any operations when an error occurs.

There is no uncertainty about the state of the remote node.

The compatibility of different implementations is enhanced.

Note that HARDWARE ERRORS, such as parity checks, may not be recoverable at the packet level. Data may have been lost or unknown state changes may have occurred. In this case the application must still perform the recovery. The operations in progress are terminated and repeated by a higher level in the using system. This is an acceptable solution since hardware errors are much less frequent than transmission errors.

The basic principles of the method of error recovery are as follows:

1. In normal operation (as described in detail above) the transmitter does not reuse a packet buffer until it has received an ACK response. This indicates that the packet has been received correctly by the destination node. Therefore, when an error occurs, the affected packet(s) are still available for retransmission. (Because a node can begin sending a second packet before receiving the acknowledgment for the first, there may be at most two packets that require retransmission.)

2. When an error is detected, both nodes enter the `check` state, invoke the Link ERP and exchange status by means of Link Resets.

3. Recovery is performed separately for each line. Each node is responsible for recovering packets that were lost on its outbound line. Because the transmitter is allowed to start sending another packet before it receives an ACK response, up to 2 packets may need to be retransmitted.

4. Before restarting communication the Link ERP forces the hardware into the `disabled` state so that both nodes are in compatible states.

5. The link protocol and ERP are designed to minimise the chances of losing or duplicating any packets when an error occurs. However, the application should protect itself against these events wherever possible. For example, the byte count can be checked for zero at the end of a data transfer and time-outs can be used to detect lost message packets.

LINK ERRORS

Except where explicitly stated, the following errors are only indicated when the link hardware is in the `ready` state prior to the error. In all cases the hardware will enter the `check` state and interrupt the node processor. Except for resets, no further packets are accepted or acknowledged until the hardware returns to the `ready` state. Errors are generally ignored if the hardware is not in the `ready` state.

ACK TIME-OUT: this is indicated when the source does not receive an ACK response within the specified time of sending the trailing FLAG of a packet other than a Total Reset. The affected packet remains in the transmit buffer for possible retransmission by the Link ERP.

ILLEGAL FRAME: this error is indicated if the receiver decodes a frame which is not one of the 4 protocol frames or one of the 256 data frames.

PROTOCOL ERROR: this error is indicated when a node receives an invalid or unexpected sequence of frames, as listed here:

1. A short packet with fewer than 4 data frames between 2 FLAGs. This may be caused by noise corrupting or manufacturing a FLAG.

2. A node receives a control field that does not specify a reset and no buffer is available, ie. when `RR pending` is set.

3. An unexpected ACK response, ie. when `waiting for ACK` is reset.

4. An isolated ACK frame. If an ACK response is corrupted the transmitter will also detect an ACK time-out.

5. An isolated RR frame.

6. A NUL frame with no intervening data frame since the last FLAG.

One half of the link will hang if an RR response is lost without any errors being detected, eg. if the RRs are changed to FLAGs while the link is idle. This is extremely unlikely and therefore no recovery is provided at the link level. Instead the application should provide a time-out for the operation in progress.

CRC ERROR: this error is indicated when a received packet has a bad CRC and none of the errors above occurred.

SEQUENCE ERROR: this error is indicated when a received packet has a PSN not equal to the RSN, none of the errors above occurred and the packet does not specify a reset. A previous packet has probably been lost.

PACKET REJECT: this error is indicated when a packet is received correctly with none of the errors above but the packet is unacceptable for any of the following reasons:

1. The packet is too long to fit in the available buffers. Note that the receiver must continue to accumulate the CRC after the buffer has overflowed in order to verify that there has not been a transmission error, eg. a corrupted FLAG.

2. The packet length is otherwise unacceptable to the implementation, eg. odd when it must be even.

3. The user-defined bits in the control field are not acceptable to the implementation.

4. The address field specifies a destination that is currently invalid or not implemented and the control field does not specify a reset.

Errors in this class are due to programming, synchronisation or compatibility problems. The Link ERP does not retry them. Instead the application will be alerted via an ERP exit so that it can retry or terminate the operations in progress.

LINE FAULT: this error is indicated when the line driver or line receiver detects an invalid voltage and the link hardware is not in the `disabled` state. The cable may be open or short-circuited, or the remote node may be powered off.

HARDWARE ERROR: this error is indicated when a node detects an internal hardware error, eg. a parity check. The Link ERP will not retry errors in this class. Instead the application will be alerted via an ERP exit so that it can retry or terminate the operations in progress.

LINK STATUS BYTE

During error recovery the Link ERP in each node builds a Link Status Byte and sends it to the other node in the address field of a Link Reset packet. FIG. 9 shows the format of the link status byte.

H/W ERROR: When `1`, this bit indicates that the node detected an internal hardware error.

LINE FAULT: When `1`, this bit indicates that the node detected a line fault on either the inbound or the outbound pair. It is provided for information only and it is not referenced by the Link ERP in the destination node.

ACK T/O: When `1`, this bit indicates that the transmitter timed out while waiting for an ACK response. It is provided for information only and it is not referenced by the Link ERP in the destination node.

RECEIVER ERRORS: This field contains a 3-bit code to identify the first error detected by the receiver:

    000   No error
    001   Illegal frame
    010   Protocol error
    011   CRC error
    100   Sequence error
    101   Packet reject

When two or more errors occur simultaneously the lowest number is reported.

RSN: This is the receive sequence number for the last packet that was acknowledged by the node, excluding Link Resets. It is needed by the Link ERP in the remote node.
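A sketch of how the Link Status Byte might be assembled from these fields, including the rule that the lowest receiver error code wins when several errors occur together. The text names the fields but does not give their bit positions, so the layout used here (most significant bit first, in the order the fields are listed) is an assumption, as are the function names.

```python
from enum import IntEnum


class ReceiverError(IntEnum):
    NONE = 0b000
    ILLEGAL_FRAME = 0b001
    PROTOCOL_ERROR = 0b010
    CRC_ERROR = 0b011
    SEQUENCE_ERROR = 0b100
    PACKET_REJECT = 0b101


def receiver_error_code(errors) -> ReceiverError:
    """Of several errors detected simultaneously, report the lowest code."""
    real = [e for e in errors if e != ReceiverError.NONE]
    return ReceiverError(min(real, default=ReceiverError.NONE))


def link_status_byte(hw_error: bool, line_fault: bool, ack_timeout: bool,
                     receiver_errors, rsn: int) -> int:
    # ASSUMPTION: bit 7 H/W ERROR, bit 6 LINE FAULT, bit 5 ACK T/O,
    # bits 4-2 RECEIVER ERRORS, bits 1-0 RSN (the 2-bit sequence number).
    code = receiver_error_code(receiver_errors)
    return (
        (int(hw_error) << 7)
        | (int(line_fault) << 6)
        | (int(ack_timeout) << 5)
        | (int(code) << 2)
        | (rsn & 0b11)
    )
```

With an ACK time-out, simultaneous CRC and sequence errors, and RSN 2, this layout gives `0b00101110`: the CRC error (011) is reported because it has the lower code.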

How the link status byte is compiled when an error is detected is described in detail below.

Error recovery is symmetrical for both nodes. When an error occurs both nodes will enter the `check` state and invoke the Link ERP. It is expected that the Link ERP will normally be implemented in software running on the node processor. However, the functions could conceivably be performed by a hardware FSM if performance is critical.

If the ERP determines that a transmission error occurred then it attempts to recover the error itself. If recovery is successful the Link ERP terminates and the application continues unaware of the error.

The ERP cannot recover some errors transparently, eg. hardware errors or permanent line faults. In these cases the ERP exits to the application, which should then perform a reset and abort the operations in progress. The ERP is carefully designed so that both nodes always recognise an unrecoverable error and remain synchronised.

Note that the time intervals included in this description are for illustrative purposes only; in practice they depend on the application and implementation.

The first (or only) node that detects the error enters the `check` state and invokes its Link ERP. The Link ERP functions as follows:

1. The ERP waits until the transmitter has finished sending the current packet, if any.

2. The ERP then builds the Link Status Byte by reference to the hardware.

3. If the line driver or receiver has detected a line fault then the ERP tries to reset the error. If this fails then the application is alerted via an ERP exit (`Permanent line fault`).

4. The ERP checks whether the receiver has indicated a `no frames` error. If so, the remote node may have powered off or it may have entered the `disabled` state. The application is alerted via an ERP exit (`Remote node disabled`).

5. The ERP saves the local TSN, which is held in TSN register 70, for later use.

6. The ERP instructs the transmitter to send a Link Reset packet containing the local Link Status Byte to the remote node. The remote node should now enter the `check` state, if it has not already done so. Either way it will invoke its Link ERP and return a Link Reset containing the remote Link Status Byte.

7. The ERP waits to receive an acknowledgement to the Link Reset that it sent. It also waits to receive a Link Reset from the remote node. If an ACK time-out occurs, or no Link Reset has been received after 1 ms, then the ERP sends another Link Reset. If an ACK time-out occurs, or no Link Reset has been received after a further 1 ms, then the application is alerted via an ERP exit (`Link Reset failed`).

8. The implementation must protect against the ERP looping if there is a permanent error. Since both nodes are always involved in error recovery, it is sufficient if only one node provides this protection, eg. the upper node in a hierarchical system. The following is an example of one method that can be used. Each invocation of the ERP increments a retry counter that is reset to zero periodically by a timer. If the number of retries in one period of the timer exceeds some maximum value then the ERP waits 10 ms to ensure the remote node recognises that retry is being aborted. The application is then alerted via an ERP exit (`Retry limit exceeded`). This scheme also protects against excessive use of the ERP in the event of severe external noise.
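The retry-limiting scheme in step 8 can be sketched with a counter and a periodic time window. The class name, limit and period are illustrative, and the clock is injectable so that the behaviour can be tested; the 10 ms wait and the ERP exit itself are left to the caller.

```python
import time
from typing import Callable


class RetryLimiter:
    """Counts ERP invocations; the count is reset at the end of each timer period."""

    def __init__(self, max_retries: int, period_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_retries = max_retries
        self.period_s = period_s
        self.clock = clock
        self.period_start = clock()
        self.count = 0

    def invoke(self) -> bool:
        """Record one ERP invocation; return False if the limit is exceeded."""
        elapsed = self.clock() - self.period_start
        if elapsed >= self.period_s:
            # One or more timer periods have expired: reset the counter and
            # advance the period start by whole periods, like a free-running timer.
            self.period_start += (elapsed // self.period_s) * self.period_s
            self.count = 0
        self.count += 1
        return self.count <= self.max_retries
```

A caller would call `invoke()` on each ERP entry and, when it returns `False`, wait 10 ms before taking the `Retry limit exceeded` exit.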

9. If either node has detected a hardware error then the application is alerted via an ERP exit (`Hardware error`). The ERP exit also indicates the node that detected the error (local node, remote node or both).

10. If either node has indicated `packet reject` then further communication may be meaningless. The application is alerted via an ERP exit (`Packet rejected`). The ERP exit also indicates the node that detected the error (local node, remote node or both).

11. Otherwise the ERP calculates the number of packets that have been sent but not acknowledged:

$$Q = (\mathit{Transmit\_pointer} - \mathit{Retry\_pointer}) \bmod N$$

where N is the number of transmit buffers that are provided. Q should be 0 or 1 packets. The ERP also calculates the number of packets that have been transmitted but not received:

$$P = (\mathit{Saved\_local\_TSN} - \mathit{Remote\_RSN}) \bmod 4$$

P should be less than or equal to Q.

If either of these checks fails, the ERP waits 10 ms to ensure that the remote node recognises an unrecoverable error. The application is then alerted via an ERP exit (`Invalid retry status`).
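The two checks in step 11 translate directly into modular arithmetic. The function and exception names are illustrative. The upper bound on Q is a parameter (`max_q`) so that it can match the number of unacknowledged packets the implementation permits.

```python
class InvalidRetryStatus(Exception):
    """Corresponds to the `Invalid retry status` ERP exit."""


def retry_counts(transmit_pointer: int, retry_pointer: int, n_buffers: int,
                 saved_local_tsn: int, remote_rsn: int,
                 max_q: int) -> tuple[int, int]:
    """Return (P, Q) as defined in step 11, or raise if the status is inconsistent."""
    # Python's % gives a non-negative result for a positive modulus, which
    # matches the wrap-around of the buffer pointers and 2-bit sequence numbers.
    q = (transmit_pointer - retry_pointer) % n_buffers  # sent, not acknowledged
    p = (saved_local_tsn - remote_rsn) % 4              # sent, not received
    if q > max_q or p > q:
        raise InvalidRetryStatus(f"P={p}, Q={q}")
    return p, q
```

For example, with 4 buffers, TP = 2, RP = 0, a saved TSN of 2, a remote RSN of 1 and `max_q=2`, the result is P = 1 and Q = 2: one packet was lost, and one was delivered but its ACK was not received.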

12. Otherwise the ERP arranges to resend the lost packets by subtracting P from its transmit pointer, modulo N.

13. Those outbound buffers that do not need to be retransmitted must now be discarded using the following algorithm:

    Do while Retry_pointer ≠ Transmit_pointer;
        Deallocate buffer at Retry_pointer;
        Increment Retry_pointer modulo N;
    End;
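Steps 12 and 13 together can be sketched as follows, with the loop mirroring the pseudocode above. The `deallocate` callback is a hypothetical hook standing in for whatever buffer management the implementation provides.

```python
from typing import Callable


def prepare_retransmission(transmit_pointer: int, retry_pointer: int, p: int,
                           n_buffers: int,
                           deallocate: Callable[[int], None]) -> tuple[int, int]:
    """Rewind the transmit pointer by P, then discard delivered buffers."""
    # Step 12: point the transmitter back at the first lost packet.
    transmit_pointer = (transmit_pointer - p) % n_buffers
    # Step 13: buffers between the retry and transmit pointers reached the
    # remote node (only their ACKs were lost), so they can be released.
    while retry_pointer != transmit_pointer:
        deallocate(retry_pointer)
        retry_pointer = (retry_pointer + 1) % n_buffers
    return transmit_pointer, retry_pointer
```

With 4 buffers, TP = 2, RP = 0 and P = 1, buffer 0 is released and both pointers end at 1, so the lost packet in buffer 1 is sent again. With P = 0 and the same pointers, both outstanding buffers are released and nothing is retransmitted.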

14. If the node has received a packet containing any of the `receiver errors` in the Link Status Byte then it must be discarded. The appropriate inbound buffer may be deallocated automatically by the receiver hardware or the ERP may have to do it explicitly. Otherwise the ERP does not need to deal with the inbound buffers. If any are full they will be emptied by the application.

15. The ERP disables the link and resets all of the latches for hardware errors, ACK time-out and receiver errors.

16. The ERP waits until the remote node enters the `disabled` state, as indicated by the `no frames` signal from the receiver. This is required to synchronise the two Link ERPs and prevent the transmitter sending an RR response while the remote node is still in the `check` state.

If the receiver does not indicate `no frames` within 1 ms, the application is alerted via an ERP exit (`Time-out waiting for disabled state`). The remote node may have detected an unrecoverable link error.

17. Otherwise the ERP enables the link.

18. The ERP waits for the link to become `ready`, which indicates that the remote node has completed its recovery. In a hierarchical system the lower node may wait indefinitely for the `ready` state. Alternatively, a time-out can be provided: if the link does not become `ready` within 1 ms, the application is alerted via an ERP exit (`Time-out waiting for ready state`). This may indicate that the remote node has powered off or encountered a Type 1 error.

19. Otherwise the ERP terminates successfully.
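Steps 15 to 19 amount to two polled waits with time-outs. The sketch below is hypothetical: `link` stands for any object offering the methods used here, `FakeLink` is a toy stand-in for exercising the code, and Python's `time.sleep` gives no real-time guarantee, so in real firmware a 1 ms time-out would more likely be enforced by a hardware timer. Passing `ready_timeout_s=None` models the lower node of a hierarchical system that waits indefinitely for `ready`.

```python
import time


class ErpExit(Exception):
    """Stands in for an ERP exit that alerts the application."""


def wait_for(condition, timeout_s, reason, poll_s=0.0001):
    """Poll `condition` until it is true; raise ErpExit after `timeout_s`.

    A timeout of None means wait indefinitely.
    """
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while not condition():
        if deadline is not None and time.monotonic() >= deadline:
            raise ErpExit(reason)
        time.sleep(poll_s)


def resynchronise(link, ready_timeout_s=0.001):
    # Step 15: disable the link and clear the error latches.
    link.disable()
    link.reset_error_latches()
    # Step 16: the remote node is disabled once the receiver reports `no frames`.
    wait_for(link.no_frames, 0.001, "Time-out waiting for disabled state")
    # Step 17: re-enable the link.
    link.enable()
    # Step 18: `ready` means the remote node has finished its own recovery.
    wait_for(link.is_ready, ready_timeout_s, "Time-out waiting for ready state")
    # Step 19: a normal return is successful termination.


class FakeLink:
    """Toy link used only to exercise resynchronise()."""

    def __init__(self, remote_disables=True, remote_recovers=True):
        self.remote_disables = remote_disables
        self.remote_recovers = remote_recovers
        self.enabled = True
        self.log = []

    def disable(self):
        self.enabled = False
        self.log.append("disable")

    def reset_error_latches(self):
        self.log.append("reset latches")

    def no_frames(self):
        return self.remote_disables

    def enable(self):
        self.enabled = True
        self.log.append("enable")

    def is_ready(self):
        return self.enabled and self.remote_recovers
```

With this toy link, `resynchronise(FakeLink())` returns normally, while `resynchronise(FakeLink(remote_recovers=False))` raises `ErpExit` with the message `Time-out waiting for ready state` after roughly 1 ms.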

After each node becomes `ready`, it will send an RR response to the other node when it has at least 1 in-bound buffer available. This will reset `waiting for RR` and allow transmission of any pending packets.

FIG. 7 illustrates one example of the operation of the Link ERP. The local node sends 2 packets back-to-back. The remote node receives them correctly but the ACK response for the first packet is corrupted. The local node then detects an illegal frame. No packets need to be retransmitted, since only the ACK response was lost. To illustrate the operation of the transmit pointer (TP) and the retry pointer (RP), it is assumed that the local node has 4 transmit buffers. The remote node has 2 receive buffers. P and Q values are those calculated according to the equations given earlier.

The detection of errors and the resultant compilation and transmission of a link reset packet, including the link status byte, will now be described in greater detail with reference to FIGS. 2a and 2b.

When an error is detected by the outbound link during packet transmission or by the inbound link during packet reception, an error line is pulsed (e.g. line 141 exiting the receiver, indicating a receiver line fault, or line 161 exiting the Rx FSM, indicating a packet reject error). The pulse causes the appropriate bit in one of two registers associated with the link hardware to be set. These two registers are the Link Error Register and the Link H/W Error Register. The Link Error Register is an 8-bit register which is used to indicate most of the detectable errors, i.e. illegal frame, protocol error, CRC error, packet reject, ACK time-out, no frames and line fault. The Link H/W Error Register indicates the type of hardware error detected. A third register, shown in FIG. 9, is the Link Status Register. Two bits of this register indicate the value of the transmit sequence number (TSN), which is maintained in the TSN register 70 of the outbound link. This value is incremented for every packet sent. The value of the TSN is used during error recovery (as described above) in the process of calculating how many packets need to be retransmitted. A further two bits of the Link Status Register indicate the value of the receive sequence number (RSN), which is maintained by the RSN register 93 of the inbound link. This value is incremented for every packet received (i.e. for every ACK response sent). It is frozen when the link enters the check state on detection of an error.
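The 2-bit TSN and RSN counters, and the effect of freezing them in the check state, can be modelled in a few lines. This Python sketch is a hypothetical model of the registers' behaviour, not of their hardware layout, and the class name `SequenceCounter` is invented for illustration. Because the counters wrap modulo 4, the difference between a frozen TSN and the remote RSN is unambiguous only while fewer than four packets are outstanding, a bound that the limit of Q to 0 or 1 comfortably satisfies.

```python
class SequenceCounter:
    """Hypothetical model of a 2-bit TSN or RSN register."""

    MODULUS = 4  # two bits in the Link Status Register

    def __init__(self):
        self.value = 0
        self.frozen = False

    def increment(self):
        # TSN: once per packet sent; RSN: once per packet received (ACK sent).
        if not self.frozen:
            self.value = (self.value + 1) % self.MODULUS

    def freeze(self):
        # Called when the link enters the check state on detecting an error.
        self.frozen = True


tsn, remote_rsn = SequenceCounter(), SequenceCounter()
for _ in range(5):  # local node sends five packets
    tsn.increment()
for _ in range(4):  # remote node receives only four of them
    remote_rsn.increment()
tsn.freeze()
tsn.increment()  # ignored: the counter is frozen
packets_lost = (tsn.value - remote_rsn.value) % SequenceCounter.MODULUS
print(tsn.value, remote_rsn.value, packets_lost)  # 1 0 1
```

Although both counters have wrapped, the modulo 4 difference still gives the one packet that was lost.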

The Link ERP invoked on detection of an error compiles the link status byte with reference to these three registers. The Link ERP is controlled by the microprocessor of FIG. 8. Each of the bits of the Link Status Byte is copied from one of the three registers described above. A receiver error (illegal frame, protocol error, sequence error, CRC error or packet reject error) is indicated in the link status byte by the three-bit code described above.

The link status byte so compiled is loaded by the microprocessor into the destination field of the packet status register (PSR) associated with the outbound link message buffer. The count field of the PSR is set to zero. Line 204 in FIG. 2a is pulsed to indicate that a reset packet is ready to be sent. The link reset packet is then transmitted by the outbound link hardware in substantially the same way as a data packet, as described in detail above. However, the control frame requested by the Tx packet FSM is obtained from line 204 and has the appropriate bit set to indicate either a link reset or a total reset. The address field is obtained in the normal way from the destination field of the packet status register, which in the case of link resets contains the compiled link status byte. There are no bytes of data in the message buffer, so the Tx FSM completes the packet with the two CRC bytes and the trailing end flag. Thus, using the method described above, little extra logic is required in the link hardware to transmit the link reset packet, which is transmitted in the same way as a normal data packet.
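The reuse of the data-packet path for link resets can be illustrated with a symbolic frame builder. Everything here beyond the frame order (control frame, address field, data bytes, two CRC bytes, end flag) is an assumption: the control-frame labels, the `END_FLAG` marker, the choice of CRC (CRC-CCITT as computed by Python's `binascii.crc_hqx`, seeded with 0xFFFF) and the bytes it covers are placeholders, not the link's actual encoding.

```python
import binascii

END_FLAG = "END_FLAG"  # hypothetical symbol for the trailing end flag


def build_packet(control, address, data=b""):
    """Assemble a packet as a list of frames, in the order the Tx FSM sends them.

    Assumed CRC: CRC-CCITT over the address and data bytes.
    """
    crc = binascii.crc_hqx(bytes([address]) + data, 0xFFFF)
    return [control, address, *data, crc >> 8, crc & 0xFF, END_FLAG]


def build_link_reset(link_status_byte, total=False):
    # Same path as a data packet: the compiled link status byte takes the place
    # of the destination address, and a zero count means no data bytes follow.
    control = "TOTAL_RESET" if total else "LINK_RESET"
    return build_packet(control, link_status_byte)


print(build_packet("DATA", 0x03, b"\x01\x02")[:4])  # ['DATA', 3, 1, 2]
print(len(build_link_reset(0x5A)))  # 5
```

The only difference between the two calls is the control frame and the empty data field, which mirrors the point that little extra logic is needed to send a link reset.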

When the link reset packet is correctly received by the inbound link of the remote node, it is acknowledged in the same way as a normal packet. The address field containing the link status byte is loaded into the packet status register of the inbound packet buffer, ready for access by the Link ERP. The control field indicating that the packet is a reset packet is held in latch 59, where it is accessed by external logic. When the reset packet has been received, the Link ERP in the remote node is invoked, which causes a link status byte to be compiled and transmitted in the same way as already described.

Each node then examines the error information contained in the link reset packet that it receives. The way the node acts in response to each type of error depends on the system connected to the node. As described previously, if a hardware error or packet reject error is detected, the ERP does not attempt to recover from the error by retransmitting packets. Instead it alerts the application by means of an ERP exit. Otherwise the ERP calculates the number of packets that have been sent but not acknowledged; the History log 51 contains this information. The ERP also calculates the number of packets that have been transmitted but not received. This value is obtained by subtracting, modulo 4, the RSN value contained in the received link status byte from the TSN value in the local node, which was frozen when the node entered the link check state. The node thus knows how many packets need to be retransmitted. As described previously, the node discards those packets which were received by the remote node.
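The symmetric exchange described above, in which each node independently works out its own retransmission count from the other node's link reset, can be simulated as follows. The names `LinkReset`, `Node` and `packets_to_resend` are hypothetical, the link reset is reduced to the fields this calculation needs, and the exit message used for hardware errors is assumed (only `Packet rejected` is named in the text).

```python
from dataclasses import dataclass


@dataclass
class LinkReset:
    rsn: int  # receive sequence number frozen in the check state
    hardware_error: bool = False
    packet_reject: bool = False


@dataclass
class Node:
    saved_tsn: int  # transmit sequence number frozen in the check state
    rsn: int
    hardware_error: bool = False
    packet_reject: bool = False

    def link_reset(self) -> LinkReset:
        return LinkReset(self.rsn, self.hardware_error, self.packet_reject)


class ErpExit(Exception):
    pass


def packets_to_resend(local: Node, received: LinkReset) -> int:
    # No retransmission is attempted for hardware or packet-reject errors.
    if local.hardware_error or received.hardware_error:
        raise ErpExit("hardware error")  # exit name assumed
    if local.packet_reject or received.packet_reject:
        raise ErpExit("Packet rejected")
    return (local.saved_tsn - received.rsn) % 4


# Node A sent three packets but B received only two; B's one packet arrived.
a = Node(saved_tsn=3, rsn=1)
b = Node(saved_tsn=1, rsn=2)
print(packets_to_resend(a, b.link_reset()), packets_to_resend(b, a.link_reset()))  # 1 0
```

Each node needs only its own frozen TSN and the RSN carried in the other node's link reset, so the two calculations can run in parallel without further handshaking.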

We claim:
 1. A data communication system including two nodes connected by a serial link over which data is transferred between the nodes in packets of a predefined format, error detection means in each node for detecting errors, and transmission error recovery means in at least one of said nodes responsive to detection of an error by the error detection means of that node to cause that node to send an error message to the other node including a sequence number indicative of the last packet received by said at least one node, and to receipt of an error message from the other node to cause said at least one node to send its error message to the other node, each node being arranged to determine from the error message from the other node the number of packets, if any, that were not correctly received by the other node and to retransmit the missing packets.
 2. A system as claimed in claim 1 in which each node includes multiple packet buffers for storing packets to be transmitted over the link or received on the link.
 3. A data communication system including two nodes connected by a serial link over which data is transferred between the nodes in packets of a predefined format, error detection means in each node for detecting errors, and transmission error recovery means in at least one of said nodes responsive either to detection of an error by the error detection means of that node to cause that node to send an error message to the other node including a sequence number indicative of the last packet received by said at least one node, or to receipt of an error message from the other node to cause said at least one node to send its error message to the other node, each node being arranged to determine from the error message from the other node the number of packets, if any, that were not correctly received by the other node and to retransmit the missing packets, each node includes multiple packet buffers for storing packets to be transmitted over the link or received on the link, and each node including means for transmitting an acknowledgement message to the other node upon receipt of a packet, transmit pointer means for indicating from which packet buffer the last packet was transmitted, and acknowledgement pointer means for indicating from which packet buffer the last packet to have been acknowledged was transmitted.
 4. A system as claimed in claim 3 including means for calculating from the transmit pointer means, the number of packets sent which have not been acknowledged, and means for comparing this number with the number of packets to be retransmitted, thereby to identify packet buffers containing packets not required to be retransmitted.
 5. An error recovery method for use in a data communication system, said data communication system comprising two nodes connected by a serial link, data being transferred between the nodes in packets of a predefined format, said method comprising the steps of: monitoring the system for errors with each node; sending a first message with a first node to a second node, in response to detecting an error by said first node, said first message including a sequence number indicative of the last packet received by said first node; receiving said first message with said second node; sending a second message with said second node to said first node, in response to said step of receiving said first message, said second message including a sequence number indicative of the last packet received by said second node; receiving said second message with said first node; determining with each node, from the respective one of said first and second messages received from the other node, the number of packets, if any, that were not correctly received by the other node; and retransmitting a packet if said determining step determines that the packet was not correctly received.
 6. The method of claim 5, further comprising the steps of: sending an acknowledgement with each node to the other node upon receipt of a packet; calculating, with each node, the number of packets it has sent which have not been acknowledged; and comparing, with each node, the number produced by said calculating step with the number of packets to be retransmitted, and discarding surplus packets when the number of packets to be retransmitted is greater than the number produced by said calculating step.
 7. The method of claim 6, wherein each said packet comprises a plurality of predefined fields, each field consisting of one or more multibit data frames, and wherein the flow of data packets is controlled by means of multiple bit control frames distinguishable from the data frames, the control frames being transmissible independently of the data packets.
 8. The method of claim 7, wherein the acknowledgment is one of the control frames.
 9. The method of claim 5, wherein said step of retransmitting a packet is not attempted upon the detection of certain predetermined errors.
 10. The method of claim 5, wherein each said packet comprises a plurality of predefined fields, each field consisting of one or more multibit data frames, and wherein the flow of data packets is controlled by means of multiple bit control frames distinguishable from the data frames, the control frames being transmissible independently of the data packets.